Fiber laser development enabled by machine learning: review and prospect


This section first introduces the concept of machine learning, followed by the learning algorithm taxonomy, and emphasizes a widely adopted algorithm, artificial neural networks (ANNs).

Concept

The fields of machine learning and optimization are intertwined, and most machine learning problems can ultimately be cast as optimization problems. Some researchers classify work based on purely adaptive and robust optimization algorithms as machine learning, for example, evolutionary algorithms (typically genetic algorithms) for coherent control of ultrafast dynamics [30], intelligent breathing soliton generation [31], and self-tuning mode-locked fiber lasers [32]. More common definitions of machine learning emphasize “learning” and “gaining knowledge” from data. A classical one is “A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E” [33]. Experience is usually presented in the form of data, and learning algorithms are methods of generating models from data. With the learned model, the machine can make predictions or take actions in tasks. Datasets, models, and learning algorithms are therefore the three core elements of machine learning.

The collection of data from experiments or numerical simulations of a specific task is called a dataset, denoted \(D={\left\{\left({x}_i,{y}_i\right)\right\}}_{i=1,2,\dots,N}\), where \(\left({x}_i,{y}_i\right)\) is an example and N is the number of examples. \({x}_i\) describes the properties of an example and is usually called a ‘sample’ or ‘feature vector’. For example, \({x}_i={\left\{{x}_{ij}\right\}}_{j=1,2,\dots,d}\) is a feature vector with dimensionality d, where each component \({x}_{ij}\) is a value that describes one property of the example. \({y}_i\) is the label of \({x}_i\), which can take the form of one of a finite set of classes, a vector, a matrix, a graph, or others. In some tasks, \({y}_i\) may not exist. Training (also known as learning) is the process of using data to generate models through learning algorithms. The undetermined parameters of the model are modified during training. Therefore, the model can be regarded as the parameterized representation of the learning algorithm on the given data and model parameter space. The data used in training is called the training dataset. Sometimes, a validation dataset is split off proportionally from the training dataset to monitor the performance of the model during training. After training, the model needs to be tested on an independent dataset, the testing dataset, drawn from the same or a similar statistical distribution as the training dataset, to evaluate how well it generalizes to new data. Figure 1 shows the general working framework of machine learning, including data preparation, algorithm selection, training, and testing.
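The split into training, validation, and testing datasets can be sketched in a few lines. This is a minimal illustration, not a procedure taken from the cited works: the function name `split_dataset`, the fractions, and the toy data are all assumptions made for the example.

```python
import numpy as np

def split_dataset(x, y, val_fraction=0.1, test_fraction=0.2, seed=0):
    """Shuffle N examples and split them into training, validation, and test subsets."""
    n = len(x)
    rng = np.random.default_rng(seed)
    order = rng.permutation(n)                 # random order, so all subsets share one distribution
    n_test = int(round(test_fraction * n))
    n_val = int(round(val_fraction * n))
    test_idx = order[:n_test]
    val_idx = order[n_test:n_test + n_val]
    train_idx = order[n_test + n_val:]
    return ((x[train_idx], y[train_idx]),
            (x[val_idx], y[val_idx]),
            (x[test_idx], y[test_idx]))

# Toy dataset (hypothetical): N = 100 examples, d = 3 features, scalar label.
x = np.arange(300, dtype=float).reshape(100, 3)
y = x.sum(axis=1)
train_set, val_set, test_set = split_dataset(x, y)
```

Shuffling before splitting is one simple way to keep the three subsets statistically similar; for time-ordered measurements a chronological split may be more appropriate.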

Fig. 1

The working framework of machine learning

Learning algorithm taxonomy

Machine learning covers a very broad field and has developed a variety of learning algorithms to handle different types of learning tasks. We describe four rough classifications of machine learning algorithms. In different tasks, the available data take different forms: labeled or unlabeled, the research object itself or only a metric value of it. According to the form of data that the algorithms use, machine learning can be divided into supervised, unsupervised, semi-supervised, and reinforcement learning (RL) [34,35,36,37,38,39,40]. The data for supervised learning are labeled, that is, \(D={\left\{\left({x}_i,{y}_i\right)\right\}}_{i=1,2,\dots,N}\). Using the difference between the actual label \({y}_i\) and the model output, the model parameters are iteratively modified so that the output matches the label better. Supervised learning aims to find a mapping \(f:\chi \to \mathcal{Y}\), where \({x}_i\in \chi\) (sample space), \({y}_i\in \mathcal{Y}\) (label space), and \(D\subset \chi \times \mathcal{Y}\), so that \(f\left({x}_i\right)\approx {y}_i\). Typical supervised learning problems include classification and regression. Unsupervised learning specializes in learning the internal representation, potential relationships, or structures of samples without labels, where \(D={\left\{{x}_i\right\}}_{i=1,2,\dots,N}\). Clustering and dimensionality reduction are two common unsupervised learning problems. Semi-supervised learning adopts partially labeled datasets, \(D={D}_1\cup {D}_2\), with \({D}_1={\left\{\left({x}_i,{y}_i\right)\right\}}_{i=1,2,\dots,N}\) and \({D}_2={\left\{{x}_i\right\}}_{i=1,2,\dots,M}\), where \(M\gg N\). Reinforcement learning attempts to learn what to do, that is, how to map situations to actions, to maximize a reward function [41]. To some degree, deep reinforcement learning is a control strategy that does not require an accurate model of the controlled object because it adapts to the environment by interacting with it [42].
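As a compact way to read the supervised case, the iterative modification of the model parameters is commonly written as the minimization of an average loss over the labeled examples. The parameter vector \(\theta\) and the loss function \(\ell\) are notation introduced here for illustration and do not appear in the text above:

\[
{\theta}^{\ast }=\arg \underset{\theta }{\min}\ \frac{1}{N}\sum_{i=1}^N\ell \left({f}_{\theta}\left({x}_i\right),{y}_i\right)
\]

For regression, \(\ell\) is often the squared error; for classification, a cross-entropy loss is a common choice.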

Machine learning algorithms can also be classified according to the learning task, such as classification algorithms, regression algorithms, clustering algorithms, and dimensionality reduction algorithms. For example, principal component analysis and manifold learning are popular dimensionality reduction algorithms. Some learning algorithms work for more than one kind of task, such as support vector machines for classification and regression tasks [43] and ANNs for almost all machine learning tasks [44,45,46,47,48].
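To make dimensionality reduction concrete, the following sketch implements principal component analysis through a singular value decomposition. The function name `pca` and the synthetic data are assumptions for the example; libraries such as scikit-learn provide more complete implementations.

```python
import numpy as np

def pca(x, n_components):
    """Project samples (rows of x) onto the leading principal components."""
    x_centered = x - x.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing singular value.
    _, s, vt = np.linalg.svd(x_centered, full_matrices=False)
    components = vt[:n_components]
    explained_variance = s[:n_components] ** 2 / (len(x) - 1)
    return x_centered @ components.T, components, explained_variance

rng = np.random.default_rng(0)
# Synthetic 3-D samples that vary almost entirely along one hidden direction.
latent = rng.normal(size=(200, 1))
direction = np.array([[1.0, 2.0, 2.0]]) / 3.0     # unit vector
x = latent @ direction + 0.01 * rng.normal(size=(200, 3))
z, components, variance = pca(x, n_components=1)   # z is the 1-D representation
```

No labels are used anywhere in the function, which is what makes this an unsupervised method.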

Depending on whether physical knowledge is involved, machine learning algorithms can be categorized as physics-based (physics-informed) or physics-free. Physics-free machine learning is a purely data-driven method. Its core is data-driven modeling: extracting hidden physical and mathematical models from available system data and representing them by learned models [49]. Unlike physical and mathematical models represented by explicit equations, data-driven models are empirical models that can serve as universal function approximators, acting as a black box that allows people to solve problems without a professional background or expertise. Generally, physics-free machine learning requires large amounts of training data and is therefore impractical for tasks where data acquisition costs are prohibitive. By contrast, physics-informed machine learning integrates data-driven modeling and prior knowledge [50]. For example, a physics-informed neural network (PINN) is designed to satisfy some physical constraints automatically, improving accuracy and enhancing generalization in small data regimes [51]. In some cases, prior physical laws can act as a regularization term that constrains the space of admissible solutions to a manageable size, steering the model towards the right solution and speeding up convergence [51,52,53].
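One common way to express a physical law as a regularization term is a composite loss, sketched below under stated assumptions. The symbols \(\theta\) (model parameters), \(\lambda\) (weighting factor), \(\mathcal{R}\) (residual of the governing equation), and \({\tilde{x}}_k\) (\({N}_c\) collocation points where the equation is enforced) are introduced here for illustration; the cited works may use different formulations:

\[
\mathcal{L}\left(\theta \right)=\frac{1}{N}\sum_{i=1}^N{\left\Vert {f}_{\theta}\left({x}_i\right)-{y}_i\right\Vert}^2+\lambda \frac{1}{N_c}\sum_{k=1}^{N_c}{\left\Vert \mathcal{R}\left[{f}_{\theta}\right]\left({\tilde{x}}_k\right)\right\Vert}^2
\]

The first term fits the available data, while the second penalizes model outputs that violate the governing equation, which is how the admissible solution space is narrowed when data are scarce.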

Machine learning algorithms feature either shallow or deep architectures. The performance of machine learning methods depends heavily on the choice of data representation (or features) [54]. In the early stage, machine learning worked with shallow architectures, for example, hidden Markov models, maximum entropy models, conditional random fields, and perceptrons or ANNs with a single hidden layer [55]. These all apply only a few nonlinear feature transformations, which limits their ability to extract features from raw data and requires engineering expertise for feature design [56]. In recent years, deep learning (DL), based on deep architectures represented by various deep neural networks (DNNs), has become a hot subfield of machine learning. Deep learning is remarkably effective at discovering intricate structures in high-dimensional data by transforming raw data into more abstract and ultimately more useful representations through multiple simple but nonlinear models [56].

Artificial neural networks

Here, we provide more information about ANNs because of their notable impact on fiber laser research. An ANN is a mathematical model that imitates the structure and function of biological neural networks and is usually used to estimate or approximate functions [38]. ANNs consist of three types of layers: input, hidden, and output. Each layer consists of many processing elements, known as neurons or nodes, each of which has a bias (also called a threshold) b and an activation function f that is usually nonlinear (such as softmax, ReLU, and sigmoid). According to the McCulloch-Pitts (MP) model [57], when node j in the network has n inputs, and \({x}_i\) (i = 1, 2, …, n) denotes the ith input with interconnection weight \({w}_{ij}\), the output of node j is \({y}_j={f}_j\left({\sum}_{i=1}^n{w}_{ij}{x}_i-{b}_j\right)\), where \({b}_j\) and \({f}_j\) are the bias and activation function of node j. Many nodes are arranged in a hierarchical structure to form a network.
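The node output formula above translates directly into code. In this minimal sketch, the function names and the numerical values of the inputs, weights, and biases are chosen only for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    return np.maximum(0.0, z)

def neuron_output(x, w, b, f):
    """Output of one node: y_j = f_j(sum_i w_ij * x_i - b_j)."""
    return f(np.dot(w, x) - b)

x = np.array([1.0, 2.0, 3.0])      # n = 3 inputs x_i
w = np.array([0.5, -1.0, 0.5])     # interconnection weights w_ij (weighted sum is 0 here)
y_sigmoid = neuron_output(x, w, 0.0, sigmoid)   # sigmoid(0) = 0.5
y_relu = neuron_output(x, w, 1.0, relu)         # relu(0 - 1) = 0
```

Note that the bias enters with a minus sign, following the threshold convention of the formula; many software libraries instead add the bias, which only changes its sign.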

The architecture of an ANN can be classified by its topological structure, i.e., the overall connectivity and the activation functions of its nodes. ANNs can be divided into feedforward and recurrent classes according to their connectivity. The feedforward neural network is the most common type, with a unidirectional multilayer structure in which data flows from the input layer to the hidden layers and then to the output layer. The simplest feedforward neural network is the fully connected neural network (FCNN), in which the nodes in each layer are connected to all the nodes in the previous layer. The recurrent neural network (RNN) was developed mainly to process sequence data, such as video and text, in which the current output is related to previous outputs. RNNs memorize previous information and apply it to the calculation of the current output: the input of the hidden layer includes not only the output of the input layer but also the output of the hidden layer at the previous time step. Theoretically, RNNs can process sequence data of any length. In practice, however, to reduce complexity, it is often assumed that the current state is related only to the previous few states. Mainstream RNNs are long short-term memory (LSTM) and gated recurrent unit (GRU) networks [58] (Fig. 2).
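The difference between the two connectivity classes can be seen in a short sketch. The code assumes tanh activations and a plain (Elman-style) recurrent cell rather than an LSTM or GRU; the layer sizes and random weights are arbitrary illustration values.

```python
import numpy as np

def fcnn_forward(x, layers):
    """Feedforward pass: each layer sees all outputs of the previous layer."""
    a = x
    for w, b in layers:
        a = np.tanh(w @ a + b)
    return a

def rnn_forward(xs, w_x, w_h, b):
    """Simple recurrent pass: the hidden state carries information from earlier steps."""
    h = np.zeros(w_h.shape[0])
    states = []
    for x_t in xs:
        # The hidden layer receives the current input and its own previous output.
        h = np.tanh(w_x @ x_t + w_h @ h + b)
        states.append(h)
    return np.stack(states)

rng = np.random.default_rng(0)
layers = [(rng.normal(size=(4, 3)), np.zeros(4)),
          (rng.normal(size=(2, 4)), np.zeros(2))]
y = fcnn_forward(rng.normal(size=3), layers)      # 3 inputs -> 4 hidden -> 2 outputs

xs = rng.normal(size=(5, 3))                      # sequence: 5 time steps, 3 features each
w_x, w_h, b = rng.normal(size=(4, 3)), rng.normal(size=(4, 4)), np.zeros(4)
states = rnn_forward(xs, w_x, w_h, b)             # one hidden state per time step
```

The same weights `w_x` and `w_h` are reused at every time step, which is why a recurrent network can, in principle, handle sequences of any length.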

Fig. 2

Architectures of artificial neural network


Training an ANN means determining its interconnection weights with search operators. Optimization is the core of training, and most machine learning problems boil down to optimization problems [59]. In practice, a great variety of gradient descent algorithms, for example, stochastic gradient descent (SGD), Adam, AdaGrad, and RMSProp [60,61,62], combined with the backpropagation algorithm, are used to train ANNs. Backpropagation is essentially a practical application of the chain rule for derivatives [56]. In recent years, in addition to gradient descent algorithms, there has been great interest in combining learning with metaheuristic optimization algorithms, such as evolutionary algorithms [63,64,65] and simulated annealing algorithms [66].
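How gradient descent and backpropagation work together can be sketched on a tiny network. The example below assumes one tanh hidden layer, a mean squared error loss, plain full-batch gradient descent, and a toy sine-fitting task; none of these choices come from the cited works.

```python
import numpy as np

def forward(params, x):
    w1, b1, w2, b2 = params
    h = np.tanh(x @ w1 + b1)                     # hidden layer
    return h, h @ w2 + b2                        # linear output layer

def loss_and_grads(params, x, y):
    """Mean squared error and its gradients, obtained with the chain rule."""
    w1, b1, w2, b2 = params
    h, y_hat = forward(params, x)
    err = y_hat - y
    loss = np.mean(err ** 2)
    d_out = 2.0 * err / err.size                 # dL/dy_hat
    g_w2 = h.T @ d_out
    g_b2 = d_out.sum(axis=0)
    d_hidden = (d_out @ w2.T) * (1.0 - h ** 2)   # propagate back through tanh
    g_w1 = x.T @ d_hidden
    g_b1 = d_hidden.sum(axis=0)
    return loss, [g_w1, g_b1, g_w2, g_b2]

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=(64, 1))
y = np.sin(3.0 * x)                              # toy regression target
params = [rng.normal(scale=0.5, size=(1, 8)), np.zeros(8),
          rng.normal(scale=0.5, size=(8, 1)), np.zeros(1)]

initial_loss, _ = loss_and_grads(params, x, y)
learning_rate = 0.05
for _ in range(500):
    _, grads = loss_and_grads(params, x, y)
    params = [p - learning_rate * g for p, g in zip(params, grads)]  # descent step
final_loss, _ = loss_and_grads(params, x, y)
```

Optimizers such as Adam or RMSProp replace only the update line; the gradients themselves are still computed by backpropagation.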

An ANN with a multilayer structure rather than a single hidden layer is expected to have a better learning ability. However, the weights of multilayer networks are difficult to optimize because of gradient diffusion (also known as the vanishing gradient problem), which becomes more serious as the number of network layers increases. This problem restricted the development of multilayer networks. In 2006, Geoffrey E. Hinton et al. proposed improved training methods for deep architectures, which is regarded as the beginning of deep learning [67]. Nowadays, the DNN, a feedforward neural network (FNN) with more than one hidden layer [16], is still the mainstream deep learning framework. Popular DNNs include the restricted Boltzmann machine (RBM), deep belief network (DBN), and convolutional neural network (CNN). Studies that exploit supervised, unsupervised, and semi-supervised learning have developed various architectures such as the autoencoder (AE), generative adversarial network (GAN), variational autoencoder (VAE), and graph convolutional network (GCN) [68]. In addition, the deep Q-network (DQN) is a representative algorithm in deep reinforcement learning, trained with a variant of Q-learning [69].
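Why gradients fade as layers are added can be illustrated with a deliberately simplified model: a chain of single-node sigmoid layers. The function `gradient_scale` and the unit weights are assumptions for this sketch; real networks have many nodes per layer, but the multiplicative structure is the same.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_scale(n_layers, weight=1.0, x=0.0):
    """Magnitude of d(output)/d(input) for a chain of n single-node sigmoid layers.

    By the chain rule the gradient is a product of per-layer factors
    weight * sigmoid'(z), and sigmoid'(z) never exceeds 0.25.
    """
    a = x
    grad = 1.0
    for _ in range(n_layers):
        s = sigmoid(weight * a)
        grad *= weight * s * (1.0 - s)   # derivative of this layer w.r.t. its input
        a = s
    return abs(grad)

scales = {n: gradient_scale(n) for n in (1, 5, 10, 20)}
```

With unit weights each added layer multiplies the gradient by at most 0.25, so the signal reaching the first layers shrinks roughly geometrically with depth.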

Learning-enabled fiber laser

We first analyze typical problems in fiber lasers and then explain what machine learning can do for them.

The learning problems in the fiber laser field can be divided into identification (learning the input-output prediction model), estimation (learning how to characterize unmeasured parameters, such as reconstructed inputs, predicted theoretical outputs, and inferred evaluation metrics of outputs), design (learning how to obtain the target), and control (learning the control law). In practice, these problems are interrelated. For example, the identified prediction model can help solve estimation (including prediction, reconstruction, and evaluation), design, and control problems. For convenience of description, a general formulation of the data relationship is considered, y = Ax, where x and y are the input and corresponding output of the fiber laser system, and A is the forward operator or transfer function of the fiber or fiber laser setup, which describes the explicit relationship (e.g., physical principles and rules) or implicit relationship (when physical knowledge is insufficient) between the input x and output y. Sometimes, additional terms are considered, such as Δx, the disturbance of the input coming from the environment; n, the noise included in the output; and E(y), an evaluation function of the output y. Table 1 and Fig. 3 illustrate the typical problems in fiber laser systems.

Table 1 Typical problems in the fiber laser systems (* means a specific value)

Fig. 3

Typical problems in the fiber laser systems

Prediction

Machine learning has demonstrated an outstanding system identification ability: it can reproduce physical models by identifying hidden structures and learning input-output functions from data, and can even distill theories of dynamic processes, transforming observed data into predictive models [53]. For example, recurrent neural networks have been influential in successful applications because of their ability to represent sequentially dependent data, such as forecasting the spatiotemporal dynamics of high-dimensional and reduced-order complex systems [70], modeling the large-scale structure and low-order statistics of turbulent convection [71], and inferring high-dimensional chaos [72]. In the fiber laser field, nonlinear dynamic systems described by nonlinear partial differential equations (PDEs), e.g., the nonlinear Schrödinger equation (NLSE), usually have no analytical solutions, so numerical methods and related calculation strategies are used to obtain numerical solutions. There is a strong interest in finding data-driven solutions through machine learning. In recent years, machine learning has shown its power in predicting complex nonlinear evolution governed by the NLSE [73,74,75]. PINNs guided by specific theories can also be an effective analytical tool to solve PDEs from incomplete models and limited data [76].
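For context, a conventional numerical solution of the NLSE, the kind of computation a learned model would aim to replace or accelerate, can be sketched with the symmetric split-step Fourier method. The sketch assumes a lossless NLSE in normalized units, \(\partial A/\partial z=-i\left({\beta}_2/2\right){\partial}^2A/\partial {t}^2+i\gamma {\left|A\right|}^2A\); the function name, parameter names, grid, and step count are illustrative choices and not taken from the cited works.

```python
import numpy as np

def split_step_nlse(a0, t, z_total, n_steps, beta2=-1.0, gamma=1.0):
    """Propagate a pulse envelope with the symmetric split-step Fourier method.

    Solves dA/dz = -i*(beta2/2)*d2A/dt2 + i*gamma*|A|^2*A in normalized units.
    """
    dz = z_total / n_steps
    omega = 2.0 * np.pi * np.fft.fftfreq(t.size, d=t[1] - t[0])
    # Dispersion acts in the frequency domain; this factor covers half a step.
    half_linear = np.exp(0.5j * (beta2 / 2.0) * omega ** 2 * dz)
    a = a0.astype(complex)
    for _ in range(n_steps):
        a = np.fft.ifft(half_linear * np.fft.fft(a))
        a = a * np.exp(1j * gamma * np.abs(a) ** 2 * dz)   # Kerr nonlinearity, full step
        a = np.fft.ifft(half_linear * np.fft.fft(a))
    return a

t = np.linspace(-20.0, 20.0, 1024, endpoint=False)
a0 = 1.0 / np.cosh(t)            # fundamental soliton for beta2 = -1, gamma = 1
a_out = split_step_nlse(a0, t, z_total=2.0, n_steps=200)
shape_error = np.max(np.abs(np.abs(a_out) - np.abs(a0)))
```

The fundamental soliton keeps its intensity profile during propagation, so `shape_error` gives a quick sanity check of the solver. Input-output pairs generated this way are one possible source of training data for a data-driven propagation model.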

Reconstruction and design (inverse problem)

Inverse problems in the fiber laser field can be divided into two categories. The first is the reconstruction problem: recovering x* from measurement data y*, where y* = Ax* + n, for example, pulse reconstruction from a speckle pattern through a multimode fiber, or mode decomposition from measured intensity patterns. The noise n can be an obstacle to achieving a high-precision reconstruction. The second is the design and manipulation problem: given a specific design target y* (e.g., a gain profile), determine the required input of the fiber laser system x* (such as the input voltages, currents, powers, and wavelengths), or the laser system itself A* (e.g., a fiber with specific structures), where y* = Ax* or y* = A*x. The noise n is usually ignored during the design process. Typical design problems include finding suitable geometric parameters during fiber structure design and shaping signals to produce target temporal and spatial characteristics. In some special cases, the target y* is too ideal to be achieved because of physical limits or restricted experimental conditions, so only a close approximation can be found.
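When the forward operator is known, reconstruction can be posed as an optimization problem and solved iteratively. The sketch below assumes a small linear operator and uses plain gradient descent on the data-fit term; the 3×3 matrix, the noise level, and the function name are hypothetical illustration values.

```python
import numpy as np

def reconstruct_iteratively(a, y_star, n_iter=500):
    """Estimate x* from y* = A x* + n by gradient descent on 0.5*||A x - y*||^2."""
    step = 1.0 / np.linalg.norm(a, ord=2) ** 2    # 1 / (largest singular value)^2
    x = np.zeros(a.shape[1])
    for _ in range(n_iter):
        x = x - step * (a.T @ (a @ x - y_star))   # gradient of the data-fit term
    return x

rng = np.random.default_rng(0)
# Hypothetical, fully known forward operator A.
a = np.array([[2.0, 0.5, 0.0],
              [0.0, 1.5, 0.3],
              [0.2, 0.0, 1.0]])
x_true = np.array([0.3, -1.2, 0.8])
y_meas = a @ x_true + 0.01 * rng.normal(size=3)   # measurement with noise n
x_est = reconstruct_iteratively(a, y_meas)
```

The estimate differs from `x_true` by an amount set by the noise, and the whole iteration has to be repeated for every new measurement.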

It should be noted that the forward operator, A, can be completely known, partly known, or unknown in different applications. When A is well known, some conventional methods can convert the inverse problem into an optimization problem and solve it with an iterative process. For each new y*, the optimization needs to be solved again from scratch. However, this scheme performs poorly or fails when the forward operator, A, is complex and requires a time-consuming calculation, or is only partly known or even totally unknown. Machine learning is a powerful tool to solve inverse problems: it learns the inverse mapping \({A}^{-1}\) from data and then obtains a solution \({x}^{\ast }={A}^{-1}\left({y}^{\ast}\right)\) in a single step. Further, additional feedback and control can help to improve the accuracy of the result, and a well-trained model can accelerate this process by replacing complex computation in A.
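The single-step idea can be illustrated with the simplest possible learner. In this sketch a linear least-squares model stands in for the neural networks typically used in practice, and the forward operator `a_true` is a hypothetical matrix that the learner never sees directly; only input-output pairs are used.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical forward operator, treated as unknown by the learner.
a_true = np.array([[2.0, 0.5, 0.0],
                   [0.0, 1.5, 0.3],
                   [0.2, 0.0, 1.0]])

# Training data: inputs x_i and noisy measurements y_i = A x_i + n.
x_train = rng.normal(size=(500, 3))
y_train = x_train @ a_true.T + 0.01 * rng.normal(size=(500, 3))

# "Training": fit a linear inverse model x ≈ y @ w_inv by least squares.
w_inv, *_ = np.linalg.lstsq(y_train, x_train, rcond=None)

def reconstruct(y_star):
    """Single-step estimate of x* from y* using the learned inverse map."""
    return y_star @ w_inv

x_star = np.array([0.3, -1.2, 0.8])
y_star = a_true @ x_star
x_hat = reconstruct(y_star)       # no iteration needed at inference time
```

Once trained, each new target costs a single matrix product, in contrast to rerunning an iterative solver. A nonlinear operator would require a nonlinear model, such as an ANN, in place of `w_inv`.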

Control

When dynamic environmental disturbances impose high requirements on control accuracy and speed, a feedback loop and a corresponding control unit are required to track the changes. Learning and optimization are two primary means of improving robustness. They usually involve computational processes incorporated within the system that trigger parameter updates and knowledge or model enhancement, so that performance improves progressively. Machine learning provides new insights for feedback and control [77, 78], particularly in dynamic, complex, and disturbance-sensitive systems, where conventional control algorithms show low control bandwidth and weak robustness. An exciting finding in the published literature is that learned models can automatically reject instrumental or environmental noise. Some applications combine machine learning with traditional algorithms to enhance performance [79, 80].

Denoising

This part has a tight relationship with image and signal processing. Machine learning techniques can overcome data error to some extent, such as removing bad points and blur in the raw data [81, 82] and completing tasks when the measurement device yields strong noise [83]. The denoising ability of machine learning is significant in many practical applications.

Other applications

Machine learning can also be used to reduce manual effort in laboratory experiments by automating hardware adjustments, such as the alignment of laser beams [84].
